Formula-E race strategy development using distributed policy gradient reinforcement learning
نویسندگان
چکیده
Energy and thermal management is a crucial element in Formula-E race strategy development. In this study, the race-level development formulated into Markov decision process (MDP) problem featuring hybrid-type action space. Deep Deterministic Policy Gradient (DDPG) reinforcement learning implemented under distributed architecture Ape-X integrated with prioritized experience replay reward shaping techniques to optimize set of actions both continuous discrete components. Soft boundary violation penalties shaping, significantly improves performance DDPG makes it capable generating faster finishing solutions. The new proposed method has shown superior comparison Monte Carlo Tree Search (MCTS) policy gradient learning, which solves fully space as presented literature. advantages are time better handling ambient temperature rise.
منابع مشابه
Using policy gradient reinforcement learning on autonomous robot controllers
Robot programmers can often quickly program a robot to approximately execute a task under specific environment conditions. However, achieving robust performance under more general conditions is significantly more difficult. We propose a framework that starts with an existing control system and uses reinforcement feedback from the environment to autonomously improve the controller’s performance....
متن کاملOnline Policy-Gradient Reinforcement Learning using OLGARB for SpaceWar
The goal of this project is to explore the use of reinforcement learning techniques to address a difficult problem domain. The SpaceWar task is a competitive, continuousvalued, partially observable problem domain that provides a fun and interesting challenge for machine learning algorithms. This project focuses on the application of online policy-gradient reinforcement learning techniques to tr...
متن کاملScalable Multitask Policy Gradient Reinforcement Learning
Policy search reinforcement learning (RL) allows agents to learn autonomously with limited feedback. However, such methods typically require extensive experience for successful behavior due to their tabula rasa nature. Multitask RL is an approach, which aims to reduce data requirements by allowing knowledge transfer between tasks. Although successful, current multitask learning methods suffer f...
متن کاملModel-based Policy Gradient Reinforcement Learning
Policy gradient methods based on REINFORCE are model-free in the sense that they estimate the gradient using only online experiences executing the current stochastic policy. This is extremely wasteful of training data as well as being computationally inefficient. This paper presents a new modelbased policy gradient algorithm that uses training experiences much more efficiently. Our approach con...
متن کاملInverse Reinforcement Learning through Policy Gradient Minimization
Inverse Reinforcement Learning (IRL) deals with the problem of recovering the reward function optimized by an expert given a set of demonstrations of the expert’s policy. Most IRL algorithms need to repeatedly compute the optimal policy for different reward functions. This paper proposes a new IRL approach that allows to recover the reward function without the need of solving any “direct” RL pr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Knowledge Based Systems
سال: 2021
ISSN: ['1872-7409', '0950-7051']
DOI: https://doi.org/10.1016/j.knosys.2021.106781